Results 1 - 15 of 15
1.
Evol Bioinform Online ; 20: 11769343241239463, 2024.
Article in English | MEDLINE | ID: mdl-38532808

ABSTRACT

Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis (TB), an infectious disease that is a major killer worldwide. Due to selection pressure caused by the use of antibacterial drugs, Mtb is characterised by mutational events that have given rise to multidrug-resistant (MDR) and extensively drug-resistant (XDR) phenotypes. The rate at which mutations occur is an important factor in the study of molecular evolution, and it helps us understand gene evolution. Within the same species, different protein-coding genes evolve at different rates. To estimate the rates of molecular evolution of protein-coding genes, a commonly used parameter is the ratio dN/dS, where dN is the number of non-synonymous substitutions per non-synonymous site and dS is the number of synonymous substitutions per synonymous site. Here, we estimated the rates of molecular evolution of selected biological processes and molecular functions across 264 strains of Mtb. We also investigated the molecular evolutionary rates of core genes of Mtb by computing their dN/dS values, and estimated the pan-genome of the 264 strains. Our results show that the cellular amino acid metabolic process and the kinase activity function evolve at a significantly higher rate, while the carbohydrate metabolic process evolves at a significantly lower rate in M. tuberculosis. These high rates of evolution correlate well with Mtb physiology and pathogenicity. We further propose that the core genome of M. tuberculosis likely experiences varying rates of molecular evolution, which may drive an interplay between the core genome and the accessory genome during M. tuberculosis evolution.
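
A minimal sketch of how a dN/dS ratio can be computed once a codon alignment has been summarised into site and difference counts. It assumes those counts are already available (counting synonymous and non-synonymous sites is not shown) and applies the Jukes-Cantor correction in the style of the Nei-Gojobori method; it illustrates the ratio defined above and is not the procedure used in this study. The function and parameter names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Return omega = dN/dS from difference and site counts.

    n_diffs / s_diffs: observed non-synonymous / synonymous differences.
    n_sites / s_sites: numbers of non-synonymous / synonymous sites.
    """
    p_n = n_diffs / n_sites  # proportion of non-synonymous sites that differ
    p_s = s_diffs / s_sites  # proportion of synonymous sites that differ
    d_n = jukes_cantor(p_n)  # substitutions per non-synonymous site
    d_s = jukes_cantor(p_s)  # substitutions per synonymous site
    if d_s == 0.0:
        raise ZeroDivisionError("dS is zero; omega is undefined")
    return d_n / d_s


if __name__ == "__main__":
    # Illustrative counts only, not data from the study
    omega = dn_ds(n_diffs=10, n_sites=300, s_diffs=5, s_sites=100)
    print(f"dN/dS = {omega:.3f}")
```

A ratio below 1 is conventionally read as purifying selection and a ratio above 1 as positive selection; with the illustrative counts above the ratio falls below 1.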

2.
J Pers Med ; 12(2), 2022 Feb 11.
Article in English | MEDLINE | ID: mdl-35207753

ABSTRACT

Genomics data are currently being produced at unprecedented rates, resulting in increased knowledge discovery and submission to public data repositories. Despite these advances, genomic information on African-ancestry populations remains scarce compared with that on European- and Asian-ancestry populations. This information is typically fragmented across several different biomedical data repositories, which often lack sufficiently fine-grained structure and annotation to account for the diversity of African populations, leading to many challenges related to the retrieval, representation and findability of such information. To overcome these challenges, we developed the African Genomic Medicine Portal (AGMP), a database that contains metadata on genomic medicine studies conducted on African-ancestry populations. The metadata are curated from two public databases related to genomic medicine, PharmGKB and DisGeNET, and were limited to genomic variants associated with disease aetiology or treatment in the context of African-ancestry populations. Over 2000 variants relevant to populations of African ancestry were retrieved. Domain experts then curated and annotated additional information associated with the studies that reported the variants, including geographical origin, ethnolinguistic group, level of association significance and, where available, other relevant study information such as study design and sample size. The AGMP functions as a dedicated resource for accessing African-specific information on genomics as applied to health research, through queries on variants, genes, diseases and drugs. The portal and its corresponding technical documentation, implementation code and content are publicly available.

3.
Front Genet ; 12: 758563, 2021.
Article in English | MEDLINE | ID: mdl-34899843

ABSTRACT

Precision medicine has brought new hope to patients around the world through the application of novel technologies for understanding the genetics of complex diseases and translating it into clinical services. Such applications, however, require a foundation of skills, knowledge and infrastructure to translate genetics for health care. The crucial element is undoubtedly the availability of genomics data for the target populations, which is seriously lacking for most parts of Africa. We discuss here why it is vital to prioritize genomics data for the South West Indian Ocean (SWIO) region, where a mosaic of ethnicities coexists. The islands of the SWIO, which comprise Madagascar, La Reunion, Mauritius, Seychelles and Comoros, have been the scene of major explorations and trade since the 17th century, lying on the route to Asia. This part of the world saw an active trade in enslaved people from East Africa to Arabia and beyond. Today's demography of the islands is a diverse mix of ancestries, including European, African and Asian. The extent of admixture has yet to be resolved. Except for a few studies in Madagascar, there is very little published data on human genetics for these countries. Isolation and small population sizes have likely resulted in reduced genetic variation and possible founder effects. There is a significant prevalence of diabetes, particularly in individuals of Indian descent, while breast and prostate cancers are on the rise. The island of La Reunion is a French overseas territory with a high standard of health care and close ties to Mauritius. Its demography is comparable to that of Mauritius, but with a predominantly mixed population and a smaller proportion of people of Indian descent. In Madagascar, on the other hand, people of African descent inhabit mostly the lower coastal zones of the West and South regions, while the upper highlands are occupied by peoples of mixed African-Indonesian ancestry. Historical records confirm the Austronesian contribution to Malagasy genomes. With the rapid progress in genomic medicine, there is a growing demand for sequencing services in clinical settings to explore the incidence of variants in candidate disease genes and other markers. Genome sequence data have become a priority in order to understand population substructure and to identify specific pathogenic variants among the different groups of inhabitants on the islands. Genomic data are increasingly being used to advise families at risk and to propose diagnostic screening measures that enhance the success of therapies. This paper discusses the complexity of the islands' populations and argues for the need for genotyping and for understanding the genetic factors associated with disease risks. The benefits to patients and the improvement in health services achievable through a concerted regional effort are described. Some private patients resort to external facilities for molecular profiling, with no return of data for research. Evidence of disease variants obtained through sequencing represents a valuable source of medical data that can guide policy decisions at the national level. There are presently no such records to support the future implementation of genomic medicine strategies.

4.
BMC Bioinformatics ; 22(1): 517, 2021 Oct 23.
Article in English | MEDLINE | ID: mdl-34688246

ABSTRACT

BACKGROUND: The wealth of biological information now available in public databases has triggered an unprecedented rise in multi-database search and data retrieval for obtaining detailed information about key functional and structural entities. This concerns investigations ranging from gene or genome analysis to protein structural analysis. However, the retrieval of interconnected data from a number of different databases is very often done repeatedly and unsystematically. RESULTS: Here, we present TAxonomy, Gene, Ontology, Protein, Structure INtegrated (TAGOPSIN), a command-line program written in Java for rapid and systematic retrieval of selected data from seven of the most popular public biological databases relevant to comparative genomics and protein structure studies. The program allows a user to retrieve organism-centred data and assemble them in a single data warehouse, which constitutes a useful resource for several biological applications. TAGOPSIN was tested with a number of organisms encompassing eukaryotes, prokaryotes and viruses. For example, it successfully integrated data for about 17,000 UniProt entries of Homo sapiens and 21 UniProt entries of human coronavirus. CONCLUSION: TAGOPSIN demonstrates efficient data integration, whereby manipulating interconnected data is more convenient than running multi-database queries. The program facilitates, for instance, interspecific comparative analyses of protein-coding genes in a molecular evolutionary study, or identification of taxa-specific protein domains and three-dimensional structures. TAGOPSIN is available as a JAR file at https://github.com/ebundhoo/TAGOPSIN and is released under the GNU General Public License.


Subjects
Proteins; Software; Computational Biology; Databases, Factual; Databases, Genetic; Genomics; Information Storage and Retrieval; User-Computer Interface
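
The convenience argued for in the TAGOPSIN conclusion comes from querying one local store instead of several remote services. The sketch below illustrates that idea with Python's built-in `sqlite3` module; the table layout, column names and placeholder rows are hypothetical and do not reflect TAGOPSIN's actual warehouse schema (TAGOPSIN itself is a Java program).

```python
import sqlite3

# Hypothetical schema: one table per source, linked by a shared UniProt-style accession
SCHEMA = """
CREATE TABLE protein   (accession TEXT PRIMARY KEY, taxon_id INTEGER, gene TEXT);
CREATE TABLE taxonomy  (taxon_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE go_term   (accession TEXT, go_id TEXT);
CREATE TABLE structure (accession TEXT, pdb_id TEXT);
"""


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create the tables and load placeholder rows (not real database content)."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?)", [(1, "organism_A")])
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [("ACC1", 1, "geneX"), ("ACC2", 1, "geneY")],
    )
    conn.executemany(
        "INSERT INTO go_term VALUES (?, ?)", [("ACC1", "GO_1"), ("ACC2", "GO_2")]
    )
    conn.executemany("INSERT INTO structure VALUES (?, ?)", [("ACC1", "STRUCT1")])


def proteins_with_structures(conn: sqlite3.Connection, organism: str):
    """One local join replaces several round trips to separate databases."""
    sql = """
    SELECT p.accession, p.gene, g.go_id, s.pdb_id
    FROM protein p
    JOIN taxonomy t  ON t.taxon_id = p.taxon_id
    JOIN go_term g   ON g.accession = p.accession
    JOIN structure s ON s.accession = p.accession
    WHERE t.name = ?
    ORDER BY p.accession
    """
    return conn.execute(sql, (organism,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    for row in proteins_with_structures(conn, "organism_A"):
        print(row)
    conn.close()
```

Only proteins present in every joined table are returned, which is the kind of cross-source filter that would otherwise require chaining queries against separate services.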
6.
PLoS One ; 15(11): e0242780, 2020.
Article in English | MEDLINE | ID: mdl-33232371

ABSTRACT

As the genomic profile of cancers varies from person to person, patient prognosis and treatment may differ based on the mutational signature of each tumour. It is therefore critical to understand the genomic drivers of cancer and to identify potential mutational commonalities across tumours originating at diverse anatomical sites. Large-scale cancer genomics initiatives such as TCGA, ICGC and GENIE have enabled the analysis of thousands of tumour genomes. Our goal was to identify new cancer-causing mutations that may be common across tumour sites using mutational and gene expression profiles. Genomic and transcriptomic data from breast, ovarian and prostate cancers were aggregated and analysed using differential gene expression methods to identify the effect of specific mutations on the expression of multiple genes. Mutated genes associated with the most differentially expressed genes were considered novel candidate driver genes, and were validated through literature mining, pathway analysis and clinical data investigation. Our driver selection method identified 116 probable novel cancer-causing genes, 4 of which were discovered in patients with no alterations in any known driver genes: MXRA5, OBSCN, RYR1 and TG. The candidate genes not previously officially classified as cancer-causing showed enrichment in cancer pathways and in cancer diseases. They also matched expected properties of cancer genes, for instance showing larger gene and protein lengths, and having mutation patterns suggesting oncogenic or tumour-suppressor properties. Our method identifies novel putative driver genes that are common across cancer sites in an unbiased way, without any a priori knowledge of pathways or gene interactions, and is therefore an agnostic approach to identifying putative common driver genes acting at multiple cancer sites.


Subjects
Databases, Nucleic Acid; Gene Expression Regulation, Neoplastic; Mutation; Oncogene Proteins; Precancerous Conditions; Transcriptome; Female; Gene Expression Profiling; Humans; Male; Oncogene Proteins/biosynthesis; Oncogene Proteins/genetics; Precancerous Conditions/genetics; Precancerous Conditions/metabolism
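
The driver selection step in the abstract above (ranking mutated genes by how many genes appear differentially expressed between carriers and non-carriers) can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: it uses Welch's t statistic with a hypothetical fixed |t| cutoff in place of a proper differential expression test with multiple-testing correction, and the gene names in the example are placeholders.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        diff = mean(a) - mean(b)
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (mean(a) - mean(b)) / se


def count_de_genes(expression, mutated, threshold=3.0):
    """Count genes whose expression differs between mutated and wild-type samples.

    expression: dict gene -> list of values, one per sample (same sample order).
    mutated:    list of booleans, True if the sample carries the mutation.
    threshold:  hypothetical |t| cutoff standing in for a significance test.
    """
    count = 0
    for values in expression.values():
        mut = [v for v, m in zip(values, mutated) if m]
        wt = [v for v, m in zip(values, mutated) if not m]
        if abs(welch_t(mut, wt)) > threshold:
            count += 1
    return count


def rank_candidates(expression, mutation_status, threshold=3.0):
    """Rank mutated genes by how many genes they appear to perturb."""
    scores = {
        gene: count_de_genes(expression, status, threshold)
        for gene, status in mutation_status.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy expression matrix: 2 genes x 6 samples (placeholder values)
    expr = {"g1": [10, 11, 10.5, 1, 1.5, 1.2], "g2": [5, 5.1, 4.9, 5, 5.2, 4.8]}
    status = {
        "geneM": [True, True, True, False, False, False],
        "geneN": [True, False, True, False, True, False],
    }
    print(rank_candidates(expr, status))
```

Genes at the top of the ranking would then be the candidates passed on to validation steps such as the literature mining and pathway analysis described in the abstract.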
7.
Nature ; 586(7831): 741-748, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33116287

ABSTRACT

The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed [1]. Here we performed whole-genome sequencing analyses of 426 individuals, comprising 50 ethnolinguistic groups including previously unsampled populations, to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci under strong selection, which were predominantly found in genes involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and of putatively damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes currently characterized as medically relevant were uncommon, but in other genes, variants denoted as 'likely pathogenic' in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.


Subjects
Genetic Variation; Genome, Human/genetics; Genomics; Health; Human Migration; Africa/ethnology; DNA Repair/genetics; Datasets as Topic; Female; Gene Flow; Genetics, Medical; Genetics, Population; Health/history; History, Ancient; Human Migration/history; Humans; Immunity/genetics; Language; Male; Metabolism/genetics; Selection, Genetic; Whole Genome Sequencing
8.
Proteins ; 85(3): 463-469, 2017 Mar.
Article in English | MEDLINE | ID: mdl-27701764

ABSTRACT

Many of the modeling targets in the blind CASP-11/CAPRI-30 experiment were protein homo-dimers and homo-tetramers. Here, we perform a retrospective docking-based analysis of the perfectly symmetrical CAPRI Round 30 targets whose crystal structures have been published. Starting from the CASP "stage-2" fold prediction models, we show that using our recently developed "SAM" polar Fourier symmetry docking algorithm combined with NAMD energy minimization often gives acceptable or better 3D models of the target complexes. We also use SAM to analyze the overall quality of all CASP structural models for the selected targets from a docking-based perspective. We demonstrate that docking only CASP "center" structures for the selected targets provides a fruitful and economical docking strategy. Furthermore, our results show that many of the CASP models are dockable in the sense that they can lead to acceptable or better models of symmetrical complexes. Even though SAM is very fast, using docking and NAMD energy minimization to pull out acceptable docking models from a large ensemble of docked CASP models is computationally expensive. Nonetheless, thanks to our SAM docking algorithm, we expect that applying our docking protocol on a modern computer cluster will give us the ability to routinely model 3D structures of symmetrical protein complexes from CASP-quality models.


Subjects
Algorithms; Computational Biology/methods; Molecular Docking Simulation/methods; Proteins/chemistry; Software; Amino Acid Motifs; Benchmarking; Binding Sites; Crystallography, X-Ray; Protein Binding; Protein Conformation; Protein Interaction Mapping; Protein Multimerization; Research Design; Structural Homology, Protein; Thermodynamics
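
In a perfectly symmetrical cyclic homo-oligomer every subunit is a rotated copy of one monomer, so a symmetry docking search only needs to place a single subunit relative to the symmetry axis. The sketch below, which assumes the axis is the z axis and uses only the standard library, builds a Cn assembly from one monomer's coordinates. It shows the symmetry constraint only and is not the SAM polar Fourier search; dihedral assemblies (for example D2 tetramers) would need an additional two-fold axis.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def cyclic_assembly(monomer, n):
    """Build a Cn homo-oligomer by copying a monomer around the z axis.

    monomer: list of (x, y, z) coordinates, already placed relative to the axis.
    n:       number of subunits (2 for a C2 homo-dimer, 4 for a C4 homo-tetramer).
    Returns a list of n subunits, each a list of rotated coordinates.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [
        [rotate_z(p, 2.0 * math.pi * k / n) for p in monomer]
        for k in range(n)
    ]


if __name__ == "__main__":
    # A one-atom "monomer" 1 unit from the axis, assembled as a C4 tetramer
    for k, subunit in enumerate(cyclic_assembly([(1.0, 0.0, 0.0)], 4)):
        print(k, [tuple(round(c, 3) for c in p) for p in subunit])
```

Because the whole complex is determined by one subunit's position and orientation, the search space is much smaller than for docking two independent partners.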
9.
Methods Mol Biol ; 1415: 91-105, 2016.
Article in English | MEDLINE | ID: mdl-27115629

ABSTRACT

Comparing and classifying protein domain interactions according to their three-dimensional (3D) structures can help to understand protein structure-function and evolutionary relationships. Additionally, structural knowledge of existing domain-domain interactions can provide a useful way to find structural templates with which to model the 3D structures of unsolved protein complexes. Here we present a straightforward guide to using the "Kbdock" protein domain structure database and its associated web site for exploring and comparing protein domain-domain interactions (DDIs) and domain-peptide interactions (DPIs) at the Pfam domain family level. We also briefly explain how the Kbdock web site works, and we provide some notes and suggestions which should help to avoid some common pitfalls when working with 3D protein domain structures.


Subjects
Databases, Protein; Proteins/chemistry; Proteins/metabolism; Internet; Models, Molecular; Molecular Docking Simulation; Protein Binding; Protein Domains; Protein Interaction Domains and Motifs; Protein Interaction Maps; Protein Structure, Secondary; Protein Structure, Tertiary
10.
Biology (Basel) ; 4(2): 327-43, 2015 Apr 09.
Article in English | MEDLINE | ID: mdl-25860777

ABSTRACT

While the number of solved 3D protein structures continues to grow rapidly, the structural rules that distinguish protein-protein interactions between different structural families are still not clear. Here, we classify and analyse the secondary structural features and promiscuity of a comprehensive non-redundant set of domain family binding sites (DFBSs) and hetero domain-domain interactions (DDIs) extracted from our updated KBDOCK resource. We have partitioned 4001 DFBSs into five classes using their propensities for three types of secondary structural elements ("α" for helices, "β" for strands, and "γ" for irregular structure) and we have analysed how frequently these classes occur in DDIs. Our results show that β elements are not highly represented in DFBSs compared to α and γ elements. At the DDI level, all classes of binding sites tend to preferentially bind to the same class of binding sites and α/β contacts are significantly disfavored. Very few DFBSs are promiscuous: 80% of them interact with just one Pfam domain. About 50% of our Pfam domains bear only one single-partner DFBS and are therefore monogamous in their interactions with other domains. Conversely, promiscuous Pfam domains bear several DFBSs among which one or two are promiscuous, thereby multiplying the promiscuity of the concerned protein.
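
The promiscuity figure quoted above (the share of binding sites that interact with a single Pfam domain) reduces to counting distinct partner families per site. A minimal sketch, assuming the interactions are available as hypothetical (site, partner family) pairs; the site and family labels in the example are placeholders, not KBDOCK identifiers.

```python
from collections import defaultdict


def partner_counts(interactions):
    """Map each binding site to the set of distinct partner domain families."""
    partners = defaultdict(set)
    for site, partner_family in interactions:
        partners[site].add(partner_family)
    return partners


def fraction_single_partner(interactions):
    """Fraction of binding sites that interact with exactly one partner family."""
    partners = partner_counts(interactions)
    if not partners:
        return 0.0
    single = sum(1 for families in partners.values() if len(families) == 1)
    return single / len(partners)


if __name__ == "__main__":
    pairs = [
        ("site1", "famA"),
        ("site1", "famA"),  # repeated contact with the same family counts once
        ("site2", "famB"),
        ("site2", "famC"),  # site2 is promiscuous: two partner families
        ("site3", "famD"),
    ]
    print(fraction_single_partner(pairs))
```

Using sets makes repeated observations of the same interaction (for example from several PDB entries) count only once per partner family.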

11.
Nucleic Acids Res ; 42(Database issue): D389-95, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271397

ABSTRACT

Comparing, classifying and modelling protein structural interactions can enrich our understanding of many biomolecular processes. This contribution describes Kbdock (http://kbdock.loria.fr/), a database system that combines the Pfam domain classification with coordinate data from the PDB to analyse and model 3D domain-domain interactions (DDIs). Kbdock can be queried using Pfam domain identifiers, protein sequences or 3D protein structures. For a given query domain or pair of domains, Kbdock retrieves and displays a non-redundant list of homologous DDIs or domain-peptide interactions in a common coordinate frame. Kbdock may also be used to search for and visualize interactions involving different, but structurally similar, Pfam families. Thus, structural DDI templates may be proposed even when there is little or no sequence similarity to the query domains.


Subjects
Databases, Protein; Protein Interaction Domains and Motifs; Binding Sites; Internet; Models, Molecular; Molecular Docking Simulation; Protein Interaction Mapping; Protein Interaction Maps; Proteins/classification; Sequence Alignment; Sequence Analysis, Protein
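
Displaying homologous interactions in a common coordinate frame, as Kbdock does, requires superposing structures. One standard way to do this for a set of corresponding atoms is the Kabsch least-squares superposition, sketched below with NumPy. It assumes the atom correspondence is already known and is not claimed to be Kbdock's implementation.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Return (R, t) minimizing the RMSD between mobile @ R.T + t and target.

    Both arrays have shape (N, 3), with row i of `mobile` corresponding
    to row i of `target` (e.g. equivalent C-alpha atoms).
    """
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Cross-covariance matrix of the centred coordinates
    h = (mobile - mob_c).T @ (target - tgt_c)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_c - r @ mob_c
    return r, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    pts = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
    rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    target = pts @ rot.T + np.array([1.0, 2.0, 3.0])
    r, t = kabsch(pts, target)
    print(rmsd(pts @ r.T + t, target))
```

Once every interaction of a family has been superposed onto a reference domain this way, the partner domains share one frame and can be compared or displayed together.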
12.
Proteins ; 81(12): 2150-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24123156

ABSTRACT

Protein docking algorithms aim to calculate the three-dimensional (3D) structure of a protein complex starting from its unbound components. Although ab initio docking algorithms are improving, there is a growing need to use homology modeling techniques to exploit the rapidly increasing volumes of structural information that now exist. However, most current homology modeling approaches involve finding a pair of complete single-chain structures in a homologous protein complex to use as a 3D template, despite the fact that protein complexes are often formed from one or more domain-domain interactions (DDIs). To model 3D protein complexes by domain-domain homology, we have developed a case-based reasoning approach called KBDOCK which systematically identifies and reuses domain family binding sites from our database of nonredundant DDIs. When tested on 54 protein complexes from the Protein Docking Benchmark, our approach provides a near-perfect way to model single-domain protein complexes when full-homology templates are available, and it extends our ability to model more difficult cases when only partial or incomplete templates exist. These promising early results highlight the need for a new and diverse docking benchmark set, specifically designed to assess homology docking approaches.


Subjects
Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Algorithms; Binding Sites; Databases, Protein; Models, Molecular; Molecular Docking Simulation; Programming Languages; Protein Conformation; Software
13.
Bioinformatics ; 28(24): 3274-81, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23093609

ABSTRACT

MOTIVATION: Aligning and comparing protein structures is important for understanding their evolutionary and functional relationships. With the rapid growth of protein structure databases in recent years, the need to align, superpose and compare protein structures rapidly and accurately has never been greater. Many structural alignment algorithms have been described in the past 20 years. However, achieving an algorithm that is both accurate and fast remains a considerable challenge. RESULTS: We have developed a novel protein structure alignment algorithm called 'Kpax', which exploits the highly predictable covalent geometry of Cα atoms to define multiple local coordinate frames in which backbone peptide fragments may be oriented and compared using sensitive Gaussian overlap scoring functions. A global alignment and hence a structural superposition may then be found rapidly using dynamic programming with secondary structure-specific gap penalties. When superposing pairs of structures, Kpax tends to give tighter secondary structure overlays than several popular structure alignment algorithms. When searching the CATH database, Kpax is faster and more accurate than the very efficient Yakusa algorithm, and it gives almost the same high level of fold recognition as TM-Align while being more than 100 times faster.


Subjects
Algorithms; Peptides/chemistry; Structural Homology, Protein; Databases, Protein; Models, Molecular; Normal Distribution; Protein Structure, Secondary; Proteins/chemistry
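
The Kpax alignment step can be illustrated with a simplified sketch: it assumes the two Cα traces have already been placed in a common frame, scores residue pairs with a Gaussian overlap of their positions, and finds a global alignment by dynamic programming. Unlike Kpax, it uses a single constant gap penalty and no local coordinate frames or fragment comparison; `sigma` and `gap` are hypothetical parameters.

```python
import math


def gaussian_overlap(a, b, sigma=1.0):
    """Overlap score of two points: 1 when they coincide, decaying with distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def align(ca1, ca2, gap=0.5, sigma=1.0):
    """Global alignment of two pre-superposed C-alpha traces by dynamic programming.

    Matched residues score a Gaussian overlap; each gap costs a constant penalty.
    Returns the total score and the list of aligned (i, j) index pairs.
    """
    n, m = len(ca1), len(ca2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + gaussian_overlap(ca1[i - 1], ca2[j - 1], sigma),
                score[i - 1][j] - gap,  # residue i of trace 1 left unmatched
                score[i][j - 1] - gap,  # residue j of trace 2 left unmatched
            )
    # Trace back from the bottom-right cell to recover the matched pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + gaussian_overlap(ca1[i - 1], ca2[j - 1], sigma)
        if math.isclose(score[i][j], diag):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] - gap):
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]


if __name__ == "__main__":
    trace1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    # Same trace with one extra, far-away residue inserted
    trace2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (100.0, 100.0, 100.0), (7.6, 0.0, 0.0)]
    print(align(trace1, trace2))
```

Making the gap penalty depend on whether a residue lies in a helix or strand, as Kpax does, would only change the two `- gap` terms.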
14.
Proteins ; 80(2): 530-45, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22081520

ABSTRACT

The question of how best to compare and classify the (three-dimensional) structures of proteins is one of the most important unsolved problems in computational biology. To help tackle this problem, we have developed a novel shape-density superposition algorithm called 3D-Blast which represents and superposes the shapes of protein backbone folds using the spherical polar Fourier correlation technique originally developed by us for protein docking. The utility of this approach is compared with several well-known protein structure alignment algorithms using receiver operating characteristic (ROC) plots of queries against the "gold standard" CATH database. Despite being completely independent of protein sequences and using no information about the internal geometry of proteins, our results from searching the CATH database show that 3D-Blast is highly competitive with current state-of-the-art protein structure alignment algorithms. A novel and potentially very useful feature of our approach is that it allows an average or "consensus" fold to be calculated easily for a given group of protein structures. We find that using consensus shapes to represent entire fold families also gives very good database query performance. We propose that using the notion of consensus fold shapes could provide a powerful new way to index existing protein structure databases, and that it offers an objective way to cluster and classify all of the currently known folds in the protein universe.


Subjects
Algorithms; Models, Molecular; Protein Folding; Proteins/chemistry; Computational Biology/methods; Databases, Protein; Proteins/classification; Sequence Alignment/methods; Structural Homology, Protein
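
The receiver operating characteristic (ROC) evaluation mentioned in the 3D-Blast abstract ranks database hits by score and traces the true-positive rate against the false-positive rate as the score threshold is lowered. A minimal sketch, assuming each hit is labelled True when it belongs to the same CATH fold as the query (that labelling rule is an assumption here), with the area under the curve computed by the trapezoidal rule.

```python
def roc_points(scores, labels):
    """ROC curve points (FPR, TPR) from sweeping a threshold down the ranked scores.

    labels: True for a relevant hit (e.g. same fold as the query), False otherwise.
    Hits with tied scores are added in one step, giving a diagonal segment.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last hit sharing this score
        if idx == len(ranked) - 1 or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Placeholder query results: higher score = better structural match
    hits = [0.95, 0.90, 0.70, 0.65, 0.40]
    same_fold = [True, False, True, False, False]
    print(auc(roc_points(hits, same_fold)))
```

An area of 1.0 means every same-fold hit is ranked above every other hit, while 0.5 corresponds to a random ranking.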
15.
Bioinformatics ; 27(20): 2820-7, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21873637

ABSTRACT

MOTIVATION: In recent years, much structural information on protein domains and their pairwise interactions has been made available in public databases. However, it is not yet clear how best to use this information to discover general rules or interaction patterns about structural protein-protein interactions. Improving our ability to detect and exploit structural interaction patterns will help to provide a better 3D picture of the known protein interactome, and will help to guide docking-based predictions of the 3D structures of unsolved protein complexes. RESULTS: This article presents KBDOCK, a 3D database approach for spatially clustering protein binding sites and for performing template-based (knowledge-based) protein docking. KBDOCK combines residue contact information from the 3DID database with the Pfam protein domain family classification together with coordinate data from the Protein Data Bank. This allows the 3D configurations of all known hetero domain-domain interactions to be superposed and clustered for each Pfam family. We find that most Pfam domain families have up to four hetero binding sites, and over 60% of all domain families have just one hetero binding site. The utility of this approach for template-based docking is demonstrated using 73 complexes from the Protein Docking Benchmark. Overall, up to 45 out of 73 complexes may be modelled by direct homology to existing domain interfaces, and key binding site information is found for 24 of the 28 remaining complexes. These results show that KBDOCK can often provide useful information for predicting the structures of unknown protein complexes. AVAILABILITY: http://kbdock.loria.fr/ CONTACT: Dave.Ritchie@inria.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Binding Sites; Databases, Protein; Humans; Models, Molecular; Multiprotein Complexes/chemistry
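
KBDOCK superposes all hetero domain-domain interactions of a Pfam family and clusters the partner binding sites in space. The sketch below shows one simple way to do such clustering: single-linkage grouping of binding-site centroids that lie within a distance cutoff, implemented with a union-find structure. Representing each site by a centroid and the cutoff value are assumptions for illustration and are not necessarily KBDOCK's criteria.

```python
import math


def cluster_sites(centroids, cutoff):
    """Group binding sites whose centroids lie within `cutoff` of each other.

    centroids: list of (x, y, z) binding-site centres, assumed to be expressed in
               a common frame (i.e. the domains have already been superposed).
    Returns a sorted list of clusters, each a list of site indices (single linkage).
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Placeholder centroids: two nearby pairs and one isolated site
    sites = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (10.5, 0, 0), (50, 0, 0)]
    print(cluster_sites(sites, cutoff=2.0))
```

Single linkage lets a chain of nearby sites merge into one cluster even if its ends are far apart; a stricter criterion such as complete linkage would avoid that at the cost of more computation.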